A Computational Lexicon of Contemporary Hebrew
نویسندگان
چکیده
Computational lexicons are among the most important resources for natural language processing (NLP). Their importance is even greater in languages with rich morphology, where the lexicon is expected to provide morphological analyzers with enough information to enable them to correctly process intricately inflected forms. We describe the Haifa Lexicon of Contemporary Hebrew, the broadest-coverage publicly available lexicon of Modern Hebrew, currently consisting of over 20,000 entries. While other lexical resources of Modern Hebrew have been developed in the past, this is the first publicly available large-scale lexicon of the language. In addition to supporting morphological processors (analyzers and generators), which was our primary objective, the lexicon is used as a research tool in Hebrew lexicography and lexical semantics. It is open for browsing on the web and several search tools and interfaces were developed which facilitate on-line access to its information. The lexicon is currently used for a variety of NLP applications.
منابع مشابه
A Finite-State Morphological Grammar of Hebrew
Morphological analysis is a crucial component of several natural language processing tasks, especially for languages with a highly productive morphology, where stipulating a full lexicon of surface forms is not feasible. This paper describes HAMSAH (HAifa Morphological System for Analyzing Hebrew), a morphological processor for Modern Hebrew, based on finite-state linguistically motivated rules...
متن کاملWord Segmentation, Unknown-word Resolution, and Morphological Agreement in a Hebrew Parsing System
We present a constituency parsing system for Modern Hebrew. The system is based on the PCFG-LA parsing method of Petrov et al. (2006), which is extended in various ways in order to accommodate the specificities of Hebrew as a morphologically rich language with a small treebank. We show that parsing performance can be enhanced by utilizing a language resource external to the treebank, specifical...
متن کاملLanguage resources for Hebrew
We describe a suite of standards, resources and tools for computational encoding and processing of Modern Hebrew texts. These include an array of XML schemas for representing linguistic resources; a variety of text corpora, raw, automatically processed and manually annotated; lexical databases, including a broad-coverage monolingual lexicon, a bilingual dictionary and a WordNet; and morphologic...
متن کاملVerb Morphology of Hebrew and Maltese — Towards an Open Source Type Theoretical Resource Grammar in GF
One of the first issues that a programmer must tackle when writing a complete computer program that processes natural language is how to design the morphological component. A typical morphological component should cover three main aspects in a given language: (1) the lexicon, i.e. how morphemes are encoded, (2) orthographic changes, and (3) morphotactic variations. This is in particular challen...
متن کاملMultifunctional Computational Lexicon of Contemporary Portuguese: An Available Resource for Multitype Applications
This paper presents some aspects of the first Portuguese frequency lexicon extracted from a corpus of large dimensions. The Multifunctional Computational Lexicon of Contemporary Portuguese (henceforth MCL) rised from the necessity of filling a gap existent in the studies of the contemporary Portuguese. Until recently, the frequency lexicons of Portuguese were of very small dimensions, such as P...
متن کامل